Supermarket Sales Analysis Project - Profile Report

In [16]:
import pandas as pd
from ydata_profiling import ProfileReport
In [17]:
# read data
sales_data = pd.read_csv('data/supermarket_sales.csv', delimiter=',', decimal='.')
In [18]:
sales_data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 17 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   Invoice ID               1000 non-null   object 
 1   Branch                   1000 non-null   object 
 2   City                     1000 non-null   object 
 3   Customer type            1000 non-null   object 
 4   Gender                   1000 non-null   object 
 5   Product line             1000 non-null   object 
 6   Unit price               1000 non-null   float64
 7   Quantity                 1000 non-null   int64  
 8   Tax 5%                   1000 non-null   float64
 9   Total                    1000 non-null   float64
 10  Date                     1000 non-null   object 
 11  Time                     1000 non-null   object 
 12  Payment                  1000 non-null   object 
 13  cogs                     1000 non-null   float64
 14  gross margin percentage  1000 non-null   float64
 15  gross income             1000 non-null   float64
 16  Rating                   1000 non-null   float64
dtypes: float64(7), int64(1), object(9)
memory usage: 132.9+ KB
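The `info()` output above shows `Date` and `Time` stored as `object` (text) columns, so they cannot be used directly for time-based grouping or resampling. A minimal sketch of one way to parse them follows. The sample rows and the `%m/%d/%Y %H:%M` format are assumptions for illustration, not taken from the file, and the `Timestamp` column name is hypothetical; check a few raw values before applying a format to the real data.

```python
import pandas as pd

# Hypothetical sample rows; the date and time formats are assumed,
# so verify them against the actual CSV before reusing this.
sample = pd.DataFrame({
    "Date": ["1/5/2019", "3/8/2019", "2/20/2019"],
    "Time": ["13:08", "10:29", "20:33"],
})

# Join the two text columns and parse them into a single datetime column.
# "Timestamp" is an illustrative column name, not part of the dataset.
sample["Timestamp"] = pd.to_datetime(
    sample["Date"] + " " + sample["Time"],
    format="%m/%d/%Y %H:%M",
)

print(sample.dtypes)
```

Passing an explicit `format` makes a mismatch raise an error instead of silently swapping day and month.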
In [19]:
sales_data.describe()
Out[19]:
        Unit price     Quantity       Tax 5%        Total        cogs  gross margin percentage  gross income      Rating
count  1000.000000  1000.000000  1000.000000  1000.000000  1000.00000              1000.000000   1000.000000  1000.00000
mean     55.672130     5.510000    15.379369   322.966749   307.58738                 4.761905     15.379369     6.97270
std      26.494628     2.923431    11.708825   245.885335   234.17651                 0.000000     11.708825     1.71858
min      10.080000     1.000000     0.508500    10.678500    10.17000                 4.761905      0.508500     4.00000
25%      32.875000     3.000000     5.924875   124.422375   118.49750                 4.761905      5.924875     5.50000
50%      55.230000     5.000000    12.088000   253.848000   241.76000                 4.761905     12.088000     7.00000
75%      77.935000     8.000000    22.445250   471.350250   448.90500                 4.761905     22.445250     8.50000
max      99.960000    10.000000    49.650000  1042.650000   993.00000                 4.761905     49.650000    10.00000
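In the summary above, `gross margin percentage` has a standard deviation of 0 (every value is 4.761905), and `Tax 5%` and `gross income` share identical statistics, which suggests the two columns may hold the same values. The sketch below shows one way to flag such columns before profiling. The helper `find_redundant_columns` and the sample frame are hypothetical, and whether the real columns match row by row has not been verified here.

```python
from itertools import combinations

import pandas as pd


def find_redundant_columns(df):
    """Return (constant columns, pairs of identical numeric columns)."""
    numeric = df.select_dtypes("number")
    # A column with a single distinct value carries no information.
    constant = [c for c in numeric.columns if numeric[c].nunique() <= 1]
    # Compare every pair of numeric columns element by element.
    duplicates = [
        (a, b)
        for a, b in combinations(numeric.columns, 2)
        if (numeric[a] == numeric[b]).all()
    ]
    return constant, duplicates


# Hypothetical rows that mimic the pattern seen in describe().
sample = pd.DataFrame({
    "Tax 5%": [5.0, 10.0, 2.5],
    "gross margin percentage": [4.761905, 4.761905, 4.761905],
    "gross income": [5.0, 10.0, 2.5],
})

constant, duplicates = find_redundant_columns(sample)
print(constant)
print(duplicates)
```

On the real data this would be called as `find_redundant_columns(sales_data)`. The element-wise comparison treats missing values as unequal, which is acceptable here because `info()` reports no nulls.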
In [20]:
profile = ProfileReport(sales_data, title="Pandas Profiling Report")
profile